Abstract

We consider a single antenna narrowband multiple access channel in which users send training
sequences to the base station and scheduling is performed based on minimum mean square error (MMSE)
channel estimates. In such a system, there is an inherent tradeoff between training overhead and the amount
of multiuser diversity achieved. We analyze a block fading channel with independent Rayleigh distributed
channel gains, where the parameters to be optimized are the number of users considered for transmission
in each block and the corresponding time and power spent on training by each user. We derive closed form
expressions for the optimal parameters in terms of K and L, where K is the number of users considered for
transmission in each block and L is the block length in symbols. Considering the behavior of the system
as L grows large, we optimize K with respect to an approximate expression for the achievable rate, and
obtain second order expressions for the resulting parameters in terms of L. The resulting number of users
trained is shown to scale as $O\left(\frac{L}{(\log L)^2}\right)$, and the corresponding achievable rate as $O(\log\log L)$.



I. Introduction 

Multiuser diversity is a powerful technique for taking advantage of channel fluctuations in wireless
communication systems [1], [2], [3]. In a cell with a large number of users experiencing independent
fading, high rates of communication can be obtained by scheduling only the users with the strongest
channels. More specifically, in a multiple access channel (MAC) with an average total power
constraint, symmetric fading statistics and full channel state information (CSI), ergodic sum capacity
is maximized by allowing only the strongest user to transmit, with the power allocation given by
waterfilling [1]. Furthermore, when the tail of the fading distribution satisfies certain conditions,
the ergodic sum capacity scales as $\log\log K_{\text{total}}$, where $K_{\text{total}}$ is the total number of users in the
system [2].$^1$ In particular, this result holds for channel distributions with exponential tails, such as
the Rayleigh distribution.

In practical systems, full CSI is an unreasonable assumption, and channel estimates are instead
obtained via training. This can require significant overhead in terms of both time and power,
particularly when the number of users in the system is large. While there exists a large amount of
literature on scheduling with training and limited feedback, most of it is for the broadcast channel
(BC) rather than the MAC. In the BC, a common setup is for the base station to broadcast a training
signal which allows each user to estimate their own channel, perform self-selection, and feed back
information to the base station [4], [5]. If the system is time division duplex (TDD) then such
techniques are also possible in the MAC, as are fully distributed approaches [6].

Motivated by the fact that many wireless systems are frequency division duplex (FDD), we
consider the case that the uplink and downlink channels differ and the users do not know their own
channels. In this case, training sequences are sent from the users to the base station rather than vice
versa. Given a finite coherence time, there is a limit on how much time can be spent on training before
the channel estimates become stale, and hence a limit on how many users can train the base station
during this time. Consequently, the ergodic sum capacity remains bounded as the total number of
users in the system grows large, and $\log\log K_{\text{total}}$ scaling is not achieved.

A. Contributions and Previous Work 

In this paper, we consider a narrowband single antenna MAC with block fading and independent 
Rayleigh distributed channel coefficients. The block length in symbols is denoted by L. During each
block, K users train the base station one at a time, after which the base station uses the channel
estimates to perform scheduling. We aim to maximize a lower bound on the ergodic capacity with
respect to the training time, training power and number of users considered for transmission.

1 If the average power constraint increases linearly with the number of users, an additional $\log K_{\text{total}}$ term appears in the scaling.
Since this power gain is not relevant to this paper, we assume a fixed average power constraint.

Our approach is similar to [7], from which we borrow much of our notation. In [7], training time
and power are optimized along with the number of subchannels trained in a single-user wideband
system. This problem is one of choosing a number of parallel channels to train and transmit data
over, whereas we consider the problem of training and user scheduling over a shared channel.
While these problems bear some similarities, there are several key differences between the two. For
example, in [7] an arbitrarily large number of subchannels can be trained simultaneously without
interference, whereas in the MAC, interference can only be avoided using orthogonal training
sequences, leading to a significant loss in the temporal degrees of freedom. Similarly, after training,
our setup does not allow for multiple users to transmit their data in parallel.

A summary of our main contributions is as follows: (1) We derive exact expressions for the 
optimal proportion of both time and power spent on training in terms of K and L. (2) By analyzing 
the behavior of the system as K and L grow large with K = o(L), we obtain second order 
expressions for each of the parameters in terms of K and L. (3) We optimize K over an approximate 
expression for the achievable rate and obtain the resulting second order expressions for each of 
the parameters in terms of L, as well as the corresponding estimation error and achievable rate. 
Numerical results are used to show that these expressions approximate the optimal parameters well 
for finite values of L. 

Other related work is presented in [8]-[13]. In [8], the work of [7] is extended to the multiuser
wideband case with random training sequences, under the assumption that the number of users
grows linearly with the block length. That is, optimization is done over the number of subchannels
for a fixed number of users but not vice versa. Analysis of a multiuser narrowband system is
performed in [9], but with a focus on the downlink channel. Specifically, the authors in [9] assume
that each user can obtain perfect knowledge of their own channel, and that feedback to the base
station requires a fixed number of bits per user. Optimization of training in a single-user MIMO
system is presented in [10] and [11]. In [10] the focus is on one-way communication where a
training sequence is followed immediately by the data, while in [11] the feedback of quantized
CSI to the transmitter is considered. In [12], a multiuser FDD MIMO broadcast channel is studied,
assuming zero-forcing beamforming with an equal number of users and base station antennas. This
is extended to other settings in [13], including TDD and erroneous feedback.

2 We use the term optimal to mean optimality with respect to the lower bound on capacity given in Section II, which we refer to
as the achievable rate.

B. Paper Organization 

The remainder of the paper is organized as follows. We present the system model and formulate
the problem in Section II. We derive expressions for the optimal parameters in terms of K and
L in Section III. In Section IV we derive asymptotic expressions for the parameters in terms of
L alone. A discussion of the asymptotic expressions is given in Section V. Numerical results are
presented in Section VI and conclusions are drawn in Section VII.

The following notations are used throughout the paper. $\log(\cdot)$ denotes the natural logarithm, and
all rates are in units of nats per channel use. $E[\cdot]$ denotes statistical expectation, and $\sim$ means
"distributed as". The distribution of a circularly symmetric complex Gaussian (CSCG) vector with
mean $\mu$ and covariance matrix $\Sigma$ is denoted by $CN(\mu, \Sigma)$. $|\cdot|$ denotes magnitude, and $\|\cdot\|$ denotes
Euclidean norm. $0_{M\times 1}$ denotes an M x 1 vector of zeros, and $I_M$ denotes the M x M identity
matrix. For two functions f(L) and g(L), we write $f = O(g)$ if $|f| \le c|g|$ for some constant c
when L is sufficiently large, $f = o(g)$ if $\lim_{L\to\infty} \frac{f}{g} = 0$, $f = \Theta(g)$ if $f = O(g)$ and $f \ne o(g)$, and
$f \sim g$ if $\lim_{L\to\infty} \frac{f}{g} = 1$.

II. System Model and Problem Statement 

We consider a single antenna FDD narrowband MAC with $K_{\text{total}}$ users communicating with a
base station. The transmitted data is assumed to be delay-insensitive. The channel is modeled as a
Rayleigh block fading channel with L symbols per block and independent fades between blocks.
Within each block, K users are considered for transmission. We assume that $K_{\text{total}}$ is sufficiently
large so that any choice of K is permitted, provided that the total training time does not exceed the
block length. The group of users considered varies between blocks using a deterministic selection
scheme known to both the base station and the users. For example, for fairness, the K users could
be chosen in a round robin fashion or using a synchronized pseudo-random number generator.
Under this setup, the system is described by



$$y = \sum_{k=1}^{K} h_k x_k + z$$

where y is the L x 1 received signal vector, $x_k$ is the L x 1 transmit symbol vector for user k,
$h_k \sim CN(0, \sigma_h^2)$ is the channel coefficient of user k, and $z \sim CN(0_{L\times 1}, \sigma_z^2 I_L)$ is an L x 1 vector
of CSCG noise samples. The transmitted symbols are subject to an average total power constraint,

$$E\left[\frac{1}{L}\sum_{k=1}^{K}\|x_k\|^2\right] \le P. \tag{1}$$

The users are assumed to be synchronized with their coherence blocks aligned in time, and each
user is assumed to experience independent fading. We note that due to the symmetry of the setup,
the power constraint could be replaced by a more realistic individual average power constraint of
$\frac{P}{K_{\text{total}}}$ for each of the $K_{\text{total}}$ users without affecting the analysis. However, we do not consider the
asymmetric case, which would require the consideration of issues such as fairness.

Since the channel coefficients $h_k$ are unknown at the base station, the start of each coherence
block is dedicated to training. One at a time, the K users under consideration transmit training
sequences, each having an equal length denoted by $\bar{T}$. The total number of symbols during training
is denoted by $T = K\bar{T}$. Each user transmits with power $P_T$ when sending their own training
sequence, and remains silent while the other training sequences are sent. At the base station, a
minimum mean square error (MMSE) channel estimate $\hat{h}_k$ is obtained for each user, with the
corresponding channel estimation error denoted by $e_k = h_k - \hat{h}_k$. The variance of this error is given
by

$$\sigma_e^2 = E\left[|e_k|^2\right] = \sigma_h^2 - \sigma_{\hat{h}}^2 = \sigma_h^2\left(1 - \frac{\sigma_h^2 \bar{T} P_T}{\sigma_h^2 \bar{T} P_T + \sigma_z^2}\right) \tag{2}$$

where $\sigma_{\hat{h}}^2 = E[|\hat{h}_k|^2]$ is the variance of $\hat{h}_k$. This variance is the same for all users, since each user
is assumed to use the same amount of time and power for training.
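As a small numerical illustration of (2), the following sketch evaluates the estimation error variance for a given per-user training length and training power. The function and argument names are illustrative choices and do not come from the paper; the example values (unit channel variance, noise variance 0.1) are assumptions made only for the demonstration.

```python
def mmse_error_variance(sigma_h2, sigma_z2, train_len, train_power):
    """Estimation error variance from (2) for a single user.

    sigma_h2    : channel variance
    sigma_z2    : noise variance
    train_len   : number of training symbols sent by the user
    train_power : transmit power used during training
    """
    # Only the product train_len * train_power (the training energy) matters.
    energy = sigma_h2 * train_len * train_power
    return sigma_h2 * (1.0 - energy / (energy + sigma_z2))


# Assumed example values: one symbol at power 1 versus two symbols at power 0.5.
err_one_symbol = mmse_error_variance(1.0, 0.1, 1, 1.0)
err_split = mmse_error_variance(1.0, 0.1, 2, 0.5)
print(err_one_symbol, err_split)
```

Both calls use the same training energy and therefore give the same error variance, which is the property exploited later when the training length is reduced to one symbol per user.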

Since the ergodic sum capacity of a fading channel with MMSE estimation is not yet known, we
instead use a lower bound achieved by treating the channel estimation error as additive Gaussian
noise. In a general setup where multiple users may be scheduled with various powers, this achievable
rate is given by [14]

$$C = (1-\alpha)\,E\left[\log\left(1 + \frac{\sum_{k\in\mathcal{K}} P_{D,k}|\hat{h}_k|^2}{\sigma_e^2\sum_{k\in\mathcal{K}} P_{D,k} + \sigma_z^2}\right)\right] \tag{3}$$

where $\alpha = \frac{T}{L}$ is the fraction of the coherence time dedicated to training, $\mathcal{K}$ is the set of users
scheduled to transmit, and $P_{D,k}$ is the transmit power of user k during data transmission. While
the cardinality of $\mathcal{K}$ may in general be a function of the channel estimates, it is evident from (3)
that for any given total power $\sum_{k\in\mathcal{K}} P_{D,k}$, the term inside the expectation in (3) is maximized
by allowing only the user with the strongest $|\hat{h}_k|^2$ to transmit. We therefore restrict our attention to
the case that $|\mathcal{K}| = 1$ and the base station schedules the user with the strongest channel estimate,
$\max_{k=1,\ldots,K}|\hat{h}_k|^2$, which will be denoted as $|\hat{h}^*|^2$. We assume that the feedback from the base station
is error-free and takes up an insignificant fraction of the coherence time, and hence the selected
user has L - T symbols available for data transmission. Under this scheme, the achievable rate is
given by

$$C = (1-\alpha)\,E\left[\log\left(1 + \frac{P_D|\hat{h}^*|^2}{P_D\sigma_e^2 + \sigma_z^2}\right)\right] \tag{4}$$

where $P_D$ is the transmit power during data transmission. We assume that $P_D$ is fixed between
blocks and chosen such that the average total power constraint is met with equality. That is,

$$P_D = \frac{P - \alpha P_T}{1 - \alpha}. \tag{5}$$

While a fixed data transmit power is generally suboptimal, it achieves performance very close to
optimal waterfilling even for moderate values of K [15], while being simple to analyze and having
a low feedback requirement.

We aim to maximize C with respect to the fraction of time spent training $\alpha$, training power
$P_T$, and number of users K, subject to the power constraint (1). The optimal parameters will be
denoted by $\alpha^*$, $P_T^*$ and $K^*$, and the corresponding achievable rate by $C^*$. In general, each of these
optimal parameters will be a function of L (e.g. $\alpha^* = \alpha^*(L)$), though this dependence is not
made explicit. We remark that while optimizing a lower bound on capacity may not give exactly
the same results as optimizing the true capacity, this problem still provides valuable insight into
the tradeoff between multiuser diversity and training overhead. Spending more time and power on 
training will clearly reduce the estimation error, but at the expense of reducing the time and power 
left for data transmission. Similarly, considering more users in each coherence block will give a 
greater amount of multiuser diversity, but at the expense of the requirement of additional training. 

III. Optimization 

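In this section we optimize the time and power spent on training for given values of K and
L by applying similar techniques to [7] to the MAC setting. We first evaluate the probability
density function (PDF) of $|\hat{h}^*|^2$. The cumulative distribution function of $|\hat{h}^*|^2$ is given by
$F(t) = \left(1 - \exp\left(-\frac{t}{\sigma_{\hat{h}}^2}\right)\right)^K$, since $|\hat{h}^*|^2$ is the maximum of K independent $\exp\left(\frac{1}{\sigma_{\hat{h}}^2}\right)$ random variables. Taking
the derivative gives the PDF of $|\hat{h}^*|^2$, denoted by f(t) and given by

$$f(t) = \frac{K}{\sigma_{\hat{h}}^2}\exp\left(-\frac{t}{\sigma_{\hat{h}}^2}\right)\left(1 - \exp\left(-\frac{t}{\sigma_{\hat{h}}^2}\right)\right)^{K-1}.$$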

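Using this expression, we write the achievable rate in two equivalent forms,

$$C = (1-\alpha)\int_0^\infty \log\left(1 + \frac{(P - \bar{\epsilon}_T)\,t}{(P - \bar{\epsilon}_T)\sigma_e^2 + (1-\alpha)\sigma_z^2}\right) f(t)\,dt \tag{6}$$

$$C = (1-\alpha)\,E\left[\log\left(1 + \frac{1}{x}|h_1^*|^2\right)\right] \tag{7}$$

where $\bar{\epsilon}_T = \alpha P_T$, $|h_1^*|^2$ is the maximum of K independent exp(1) random variables, and

$$x = \frac{P_D\sigma_e^2 + \sigma_z^2}{P_D\sigma_{\hat{h}}^2} \tag{8}$$

is the effective inverse signal to noise ratio.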
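The expectation in the second form of the rate can be evaluated numerically without sampling. The sketch below is one possible way to do this and is not taken from the paper: it uses the identity E[g(Y)] = ∫ g'(t) Pr(Y > t) dt for g(0) = 0, where Y is the maximum of K independent exp(1) variables, together with a plain trapezoidal rule. The function names, the truncation point and the step count are assumptions chosen for illustration.

```python
import math


def survival_of_max(t, num_users):
    """Pr(max of num_users i.i.d. exp(1) variables > t)."""
    if t <= 0.0:
        return 1.0
    # -expm1(-t) equals 1 - exp(-t), computed accurately for small t.
    return 1.0 - (-math.expm1(-t)) ** num_users


def expected_log_term(num_users, x, steps=20000):
    """E[log(1 + Y/x)] with Y the max of num_users exp(1) variables.

    Integrates Pr(Y > t) / (x + t) over t >= 0 with the trapezoidal rule,
    truncating the integral where the tail is negligible.
    """
    t_max = math.log(num_users) + 40.0
    h = t_max / steps
    total = 0.5 * (1.0 / x + survival_of_max(t_max, num_users) / (x + t_max))
    for i in range(1, steps):
        t = i * h
        total += survival_of_max(t, num_users) / (x + t)
    return total * h


def achievable_rate(alpha, x, num_users):
    """Achievable rate in nats per channel use for given alpha, x and K."""
    return (1.0 - alpha) * expected_log_term(num_users, x)


print(achievable_rate(0.05, 0.15, 14))
```

A deterministic quadrature is used here rather than Monte Carlo so that repeated evaluations over many values of K are reproducible.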

We begin by optimizing $\alpha$ for fixed values of $\bar{\epsilon}_T$ and K.$^3$ From (2), and writing $\bar{T}P_T = \frac{L}{K}\bar{\epsilon}_T$, $\sigma_e^2$
and $\sigma_{\hat{h}}^2$ depend on $\alpha$ only through $\bar{\epsilon}_T$. Hence, from (6), optimizing $\alpha$ is equivalent to maximizing
$(1-\alpha)\log\left(1 + \frac{a}{b-\alpha}\right)$ for some a, b > 0. This function is decreasing in $\alpha$, hence we choose $\alpha$ to be
as low as possible while still ensuring all K users perform training. This is achieved by

$$\alpha^* = \frac{K}{L} \tag{9}$$

by setting $\bar{T} = 1$ training symbol per user.$^4$ This is sufficient to obtain meaningful estimates of
each of the K users' channels since the system is narrowband and each user has only one antenna.

3 While $\bar{\epsilon}_T$ depends on $\alpha$, it can be kept fixed as $\alpha$ varies by adjusting $P_T$ accordingly. This corresponds to keeping the training
energy fixed while varying the training time and power.

January 21, 2013 DRAFT

Next we optimize the training power. Instead of optimizing $P_T$ directly, we optimize the proportion
of power spent on training, denoted by $\epsilon_T$ and given by $\epsilon_T = \frac{\bar{\epsilon}_T}{P}$. From (7) it is clear that C
is decreasing in x for any fixed K. Hence the optimal value of $\epsilon_T$, denoted by $\epsilon_T^*$, minimizes x.
Substituting (2) and (5) into (8) and setting $\bar{T} = 1$ gives

$$x = \left(1 + \frac{\alpha}{S\epsilon_T}\right)\left(1 + \frac{1-\alpha}{S(1-\epsilon_T)}\right) - 1 \tag{10}$$

where $S = \frac{P\sigma_h^2}{\sigma_z^2}$ is the overall signal to noise ratio (SNR). Taking the derivative gives

$$\frac{\partial x}{\partial\epsilon_T} = \frac{\alpha^2(1-2\epsilon_T) - \alpha(2S\epsilon_T^2 - 2S\epsilon_T + S - 2\epsilon_T + 1) + S\epsilon_T^2}{S^2(1-\epsilon_T)^2\epsilon_T^2}. \tag{11}$$

Hence, setting $\frac{\partial x}{\partial\epsilon_T} = 0$ gives $\epsilon_T^*$ as the solution of the quadratic equation

$$\epsilon_T^2 S(1-2\alpha) + \epsilon_T\left(2\alpha(S+1) - 2\alpha^2\right) + \alpha^2 - \alpha(S+1) = 0 \tag{12}$$

the positive solution of which is

$$\epsilon_T^* = \frac{-\left(\alpha(S+1) - \alpha^2\right) + \sqrt{\alpha(S+S^2) + (1-S-S^2)\alpha^2 - 2\alpha^3 + \alpha^4}}{S(1-2\alpha)}. \tag{13}$$

We now show that for all $\alpha \in (0,1)$ this expression is in the range (0, 1) and therefore a valid value
of $\epsilon_T$. From (11) it is straightforward to show that $\frac{\partial x}{\partial\epsilon_T}$ approaches $-\infty$ as $\epsilon_T$ approaches 0 from
above, and $\infty$ as $\epsilon_T$ approaches 1 from below. Observing that $\frac{\partial x}{\partial\epsilon_T}$ is continuous for $\epsilon_T \in (0,1)$,
it follows that $\frac{\partial x}{\partial\epsilon_T} = 0$ somewhere in this range. Since $\alpha \in (0,1)$ implies the coefficient to $\epsilon_T$ in
(12) is positive, it is simple to show that (12) has at most one positive solution, and that this is
precisely the previously mentioned root of $\frac{\partial x}{\partial\epsilon_T}$ in the range (0, 1).
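The closed-form minimizer can be sanity-checked against a brute-force scan. The sketch below assumes the effective inverse SNR has the product form x = (1 + α/(Sε_T))(1 + (1-α)/(S(1-ε_T))) - 1 and that the minimizer is the positive root of the quadratic above; the function names and the test point (α = 14/250, S = 10) are illustrative choices rather than part of the paper.

```python
import math


def effective_inverse_snr(alpha, eps_t, snr):
    """Effective inverse SNR x with one training symbol per user."""
    return (1.0 + alpha / (snr * eps_t)) * (
        1.0 + (1.0 - alpha) / (snr * (1.0 - eps_t))
    ) - 1.0


def optimal_power_fraction(alpha, snr):
    """Positive root of the quadratic in eps_T; assumes alpha != 0.5."""
    half_b = alpha * (snr + 1.0) - alpha ** 2
    disc = (alpha * (snr + snr ** 2) + (1.0 - snr - snr ** 2) * alpha ** 2
            - 2.0 * alpha ** 3 + alpha ** 4)
    return (-half_b + math.sqrt(disc)) / (snr * (1.0 - 2.0 * alpha))


alpha, snr = 14 / 250, 10.0  # illustrative operating point
eps_star = optimal_power_fraction(alpha, snr)

# Brute-force check: scan a fine grid of eps_T in (0, 1) for the smallest x.
grid = [i / 10000 for i in range(1, 10000)]
eps_grid = min(grid, key=lambda e: effective_inverse_snr(alpha, e, snr))
print(eps_star, eps_grid)
```

The two values should agree to within the grid spacing, since x decreases and then increases in ε_T on (0, 1).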

4 We note that training K users one at a time with one symbol each gives the same performance as any orthogonal training 
sequences of length K using MMSE estimation. Other choices in which multiple users transmit simultaneously, such as Walsh- 
Hadamard sequences, may be more practical in systems with a peak transmit power constraint. 
With $\alpha^*$ and $\epsilon_T^*$ known in closed form for any given K, $K^*$ can be found using an exhaustive
search over $K \in \{1, 2, \ldots, L-1\}$, since training any more than L - 1 users would leave no time
for data transmission. This problem has O(L) complexity and can be solved efficiently even for
large values of L.
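A minimal sketch of this exhaustive search is given below. It is not the authors' implementation: the rate is evaluated by a trapezoidal quadrature of the survival function of the maximum of K exp(1) variables, and the power fraction is computed from an algebraically equivalent form of the positive root of the quadratic that avoids dividing by 1 - 2α (which vanishes when K = L/2). All function names, the truncation point and the step count are assumptions.

```python
import math


def optimal_power_fraction(alpha, snr):
    """Minimizer of x over eps_T, written as -c / (b/2 + sqrt(disc))."""
    half_b = alpha * (snr + 1.0) - alpha ** 2
    c = alpha ** 2 - alpha * (snr + 1.0)
    a = snr * (1.0 - 2.0 * alpha)
    return -c / (half_b + math.sqrt(half_b ** 2 - a * c))


def effective_inverse_snr(alpha, eps_t, snr):
    """Effective inverse SNR x with one training symbol per user."""
    return (1.0 + alpha / (snr * eps_t)) * (
        1.0 + (1.0 - alpha) / (snr * (1.0 - eps_t))
    ) - 1.0


def expected_log_term(num_users, x, steps=4000):
    """E[log(1 + Y/x)], Y = max of num_users exp(1) variables (trapezoid)."""
    t_max = math.log(num_users) + 30.0
    h = t_max / steps
    total = 0.5 / x  # integrand at t = 0; the value at t_max is negligible
    for i in range(1, steps):
        t = i * h
        survival = 1.0 - (-math.expm1(-t)) ** num_users
        total += survival / (x + t)
    return total * h


def achievable_rate(num_users, block_len, snr):
    """Rate with alpha = K/L and the power fraction chosen to minimize x."""
    alpha = num_users / block_len
    eps_t = optimal_power_fraction(alpha, snr)
    x = effective_inverse_snr(alpha, eps_t, snr)
    return (1.0 - alpha) * expected_log_term(num_users, x)


def best_num_users(block_len, snr):
    """Exhaustive O(L) search over K in {1, ..., L - 1}."""
    return max(range(1, block_len),
               key=lambda k: achievable_rate(k, block_len, snr))


k_star = best_num_users(250, 10.0)
print(k_star, achievable_rate(k_star, 250, 10.0))
```

The search calls the rate function once per candidate K, so the overall cost is linear in L times the cost of one quadrature.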

IV. Scaling 

While it is simple to find the optimal K for a given block length L numerically, finding it
analytically appears to be difficult. In order to gain insight into the behavior of the optimal K,
we analyze the asymptotic behavior of the system as L grows large. We remark that in practical
systems the coherence time cannot be chosen, so studying the system behavior as $L \to \infty$ has
practical limitations. However, we show via numerical results in Section VI that the asymptotic
expressions give good approximations to the optimal behavior even for moderate values of L.

We begin with a lemma regarding the asymptotic behavior of $C^*$ and $\alpha^*$.

Lemma 1. As the block length L tends to $\infty$, $C^* \to \infty$ and $\alpha^* \to 0$.

Proof: Suppose that the chosen parameters are $K = L^{1/2}$ and $\epsilon_T = L^{-1/4}$. Using $\alpha = \frac{K}{L}$ we
have $\alpha \to 0$, and from (10) we obtain $x \sim \frac{1}{S}$. Substituting these into (7) gives $C \sim E[\log(1 + S|h_1^*|^2)]$.
The right hand side of this asymptotic expression corresponds to the ergodic capacity of a MAC
with Rayleigh fading, K users and zero estimation error, which implies $C \sim \log\log K$. Substituting
$K = L^{1/2}$ gives $C \sim \log\log L$, which proves that $C \to \infty$ is achievable and therefore $C^* \to \infty$.
To prove that $\alpha^* \to 0$, we note that even if perfect channel estimation is assumed with the
only effect of training being a loss in temporal degrees of freedom, the achievable rate scales as
$(1-\alpha)\log\log K \le (1-\alpha)\log\log L$, where the inequality follows from $K \le L$. Since C is a lower
bound on this rate it is clear that $\alpha \ne o(1)$ is suboptimal, since we have shown that $C \sim \log\log L$
is achievable. ■

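Since $\alpha^* \to 0$ by Lemma 1, meaningful expressions for the parameters are obtained by considering
only the lowest powers of $\alpha^* = \frac{K}{L}$, or the highest powers of $\frac{L}{K}$. Using this result, we give
second order asymptotic expressions for $\epsilon_T^*$ and $P_T^*$ in terms of K and L.

Lemma 2. As $L \to \infty$ and $K \to \infty$ with K = o(L), $\epsilon_T^*$ and $P_T^*$ satisfy

$$\epsilon_T^* = \sqrt{\frac{S+1}{S}}\sqrt{\frac{K}{L}} - \frac{S+1}{S}\frac{K}{L} + O\left(\frac{K}{L}\sqrt{\frac{K}{L}}\right) \tag{14}$$

$$P_T^* = P\sqrt{\frac{S+1}{S}}\sqrt{\frac{L}{K}} - P\,\frac{S+1}{S} + O\left(\sqrt{\frac{K}{L}}\right) \tag{15}$$

and the corresponding estimation error satisfies

$$(\sigma_e^2)^* = \frac{\sigma_z^2}{P}\sqrt{\frac{S}{S+1}}\sqrt{\frac{K}{L}} + \frac{\sigma_z^2}{P}\frac{S}{S+1}\frac{K}{L} + O\left(\frac{K}{L}\sqrt{\frac{K}{L}}\right). \tag{16}$$

Proof: Several steps of this proof will make use of $\frac{1}{1+a} = 1 - a + O(a^2)$ and $\sqrt{1+a} = 1 + O(a)$
as $a \to 0$. Using (13), we obtain

$$\epsilon_T^* = \frac{1}{S(1-2\alpha)}\left(-\alpha(S+1) + O(\alpha^2) + \sqrt{\alpha(S^2+S)}\times\sqrt{1 + O(\alpha)}\right)$$

from which (14) follows using $\alpha = \frac{K}{L}$. Substituting (14) into $P_T^* = \frac{P\epsilon_T^*}{\alpha}$ gives (15). Finally,
simplifying (2) as $\sigma_e^2 = \frac{\sigma_z^2}{P_T}\left(1 - \frac{\sigma_z^2}{\sigma_h^2 P_T} + O\left(\frac{1}{P_T^2}\right)\right)$ and using (15) to evaluate
$\frac{1}{P_T^*} = \frac{1}{P}\sqrt{\frac{S}{S+1}}\sqrt{\frac{K}{L}}\left(1 + \sqrt{\frac{S+1}{S}}\sqrt{\frac{K}{L}} + O\left(\frac{K}{L}\right)\right)$, (16) follows. ■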
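The quality of the second order expansion of the power fraction can be checked numerically. The sketch below compares the closed-form root of the quadratic in ε_T with the two-term expansion sqrt((S+1)/S) sqrt(α) - ((S+1)/S) α, where α = K/L. The function names and the value S = 10 are assumptions made for illustration; the printed last column is the gap divided by α^{3/2}, which should stay bounded if the remainder is of that order.

```python
import math


def exact_power_fraction(alpha, snr):
    """Closed-form positive root of the quadratic in eps_T (alpha != 0.5)."""
    half_b = alpha * (snr + 1.0) - alpha ** 2
    disc = (alpha * (snr + snr ** 2) + (1.0 - snr - snr ** 2) * alpha ** 2
            - 2.0 * alpha ** 3 + alpha ** 4)
    return (-half_b + math.sqrt(disc)) / (snr * (1.0 - 2.0 * alpha))


def second_order_power_fraction(alpha, snr):
    """Two-term expansion in powers of sqrt(alpha)."""
    ratio = (snr + 1.0) / snr
    return math.sqrt(ratio * alpha) - ratio * alpha


snr = 10.0  # assumed SNR for the illustration
for alpha in (1e-2, 1e-3, 1e-4):
    exact = exact_power_fraction(alpha, snr)
    approx = second_order_power_fraction(alpha, snr)
    print(alpha, exact, approx, abs(exact - approx) / alpha ** 1.5)
```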
In order to obtain expressions for each of the parameters in terms of L alone, optimization over
K is required. However, C appears to be difficult to optimize over K directly. To simplify the
analysis, we consider two approximations of C, given by

$$C_{a1} = \left(1 - \frac{K}{L}\right)\log\left(1 + \frac{1}{x}\log K\right) \tag{17}$$

$$C_{a2} = \left(1 - \frac{K}{L}\right)\log\left(1 + S\left(1 - 2\sqrt{\frac{S+1}{S}}\sqrt{\frac{K}{L}}\right)\log K\right) \tag{18}$$

We denote the value of K which maximizes $C_{a2}$ as $K_a^*$. While we do not claim that $K^*$ and $K_a^*$
have the exact same behavior, the following lemma shows that asymptotically there is zero loss in
the rate achieved by optimizing $C_{a1}$ or $C_{a2}$ instead of C.

Lemma 3. Suppose $\alpha$ and $\epsilon_T$ are chosen according to (9) and (13) respectively. If K is chosen to
maximize any one of C, $C_{a1}$ or $C_{a2}$ then $\lim_{L\to\infty}|C - C_{a1}| = 0$ and $\lim_{L\to\infty}|C - C_{a2}| = 0$.

Proof: See Appendix A. ■

As shown in the proof of Lemma 3, $C_{a2}$ is obtained by substituting the asymptotic expressions
for $\alpha^*$ and $\epsilon_T^*$ into $C_{a1}$ and performing asymptotic simplifications. We further justify the use of
$C_{a2}$ in the proof of the following lemma, where we show that the neglected asymptotic terms do
not affect the resulting second order expression for $K_a^*$. That is, the values of K which maximize
$C_{a1}$ and $C_{a2}$ have the same second order expressions.

Lemma 4. $K_a^*$ satisfies

$$L = \frac{S+1}{S}K_a^*(\log K_a^*)^2 + 2K_a^*(\log K_a^*)(\log\log K_a^*) + O\left(K_a^*(\log\log K_a^*)^2\right). \tag{19}$$

Proof: See Appendix B. ■

We now have an expression for L in terms of $K_a^*$, and expressions for the optimal parameters in
terms of K and L. Combining these, the following theorem gives asymptotic expressions for $K_a^*$,
the optimal parameters when $K = K_a^*$, and the corresponding estimation error and achievable rate.

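Theorem 5. $K_a^*$ is given by

$$K_a^* = \frac{S}{S+1}\frac{L}{(\log L)^2} + \frac{S(2S+4)}{(S+1)^2}\frac{L\log\log L}{(\log L)^3} + O\left(\frac{L}{(\log L)^3}\right). \tag{20}$$

Furthermore, with $K = K_a^*$ the optimal parameters are given by

$$\alpha^* = \frac{S}{S+1}\frac{1}{(\log L)^2} + \frac{S(2S+4)}{(S+1)^2}\frac{\log\log L}{(\log L)^3} + O\left(\frac{1}{(\log L)^3}\right) \tag{21}$$

$$\epsilon_T^* = \frac{1}{\log L} + \frac{S+2}{S+1}\frac{\log\log L}{(\log L)^2} + O\left(\frac{1}{(\log L)^2}\right) \tag{22}$$

$$P_T^* = \frac{P(S+1)}{S}\log L - \frac{P(S+2)}{S}\log\log L + O(1) \tag{23}$$

with corresponding estimation error and achievable rate, respectively, given by

$$(\sigma_e^2)^* = \frac{\sigma_z^2}{P}\frac{S}{S+1}\frac{1}{\log L} + \frac{\sigma_z^2}{P}\frac{S(S+2)}{(S+1)^2}\frac{\log\log L}{(\log L)^2} + O\left(\frac{1}{(\log L)^2}\right) \tag{24}$$

$$C^* = \log\log L + \log S + o(1). \tag{25}$$

Proof: See Appendix C. ■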
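To get a feel for how slowly these asymptotic expressions settle, the following sketch evaluates the two leading terms of the expression for the number of users, (S/(S+1)) L/(log L)^2 + (S(2S+4)/(S+1)^2) L log log L/(log L)^3, and maps the result back to a block length through the two leading terms of Lemma 4. The remainder terms are simply dropped, so this is an approximation under that assumption; the function names and S = 10 are illustrative.

```python
import math


def users_second_order(block_len, snr):
    """Two leading terms of the asymptotic number of users trained."""
    log_l = math.log(block_len)
    first = snr / (snr + 1.0) * block_len / log_l ** 2
    second = (snr * (2.0 * snr + 4.0) / (snr + 1.0) ** 2
              * block_len * math.log(log_l) / log_l ** 3)
    return first + second


def block_length_from_users(num_users, snr):
    """Two leading terms of L as a function of K (Lemma 4)."""
    log_k = math.log(num_users)
    return ((snr + 1.0) / snr * num_users * log_k ** 2
            + 2.0 * num_users * log_k * math.log(log_k))


snr = 10.0  # assumed SNR for the illustration
for block_len in (1e4, 1e6, 1e9, 1e12):
    k_a = users_second_order(block_len, snr)
    # A ratio close to 1 means the two truncated expansions are consistent.
    ratio = block_length_from_users(k_a, snr) / block_len
    print(block_len, k_a, k_a / block_len, ratio)
```

The ratio in the last column approaches 1 only slowly, which reflects the logarithmic nature of the neglected terms.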
V. Discussion

We make the following observations on the results of the previous section:

• The number of users considered in each block, K, increases as $\Theta\left(\frac{L}{(\log L)^2}\right)$, so that the proportion
of time spent on training, $\alpha$, decreases as $\Theta\left(\frac{1}{(\log L)^2}\right)$. It is unsurprising that K grows unbounded,
as a larger L means there is more time available for training before the channel estimates
become stale, and therefore more users can be trained to achieve greater multiuser diversity.
The reason the proportion of time spent on training decreases to zero is that the loss in temporal
degrees of freedom due to training is linear in K, while the multiuser diversity term is only
double logarithmic in K.

• The scaling of K is slower than the $\Theta\left(\frac{L}{\log L\,\log\log L}\right)$ growth when estimation error is not
considered and the only loss due to training is in the temporal degrees of freedom [9].$^5$
Intuitively, this is because assuming perfect training with no power overhead means that training
an extra user is considered to be more valuable than in the case of imperfect training, so the
corresponding optimization problem gives a higher value for K.

• The transmit power during training, $P_T$, increases as $\Theta(\log L)$, giving an estimation error which
decreases as $\Theta\left(\frac{1}{\log L}\right)$. The reason that $P_T$ grows unbounded is that for large L the proportion
of time spent on training is small, so the instantaneous power can be large while still having
little effect on the power remaining for data transmission. On the other hand, the proportion
of power $\epsilon_T$ spent on training decreases as $\Theta\left(\frac{1}{\log L}\right)$, so that asymptotically the loss of rate due
to reduced data transmit power becomes negligible.

• Constant factors of $\frac{S}{S+1}$ and $\frac{S+1}{S}$ appear in the expressions for K and $P_T$ respectively. This
indicates that when the SNR is low, it is preferable to spend the available power training
fewer users accurately, rather than training a larger number of users inaccurately. This can be
explained by the fact that C is obtained by treating the estimation error as additive noise, which
incurs significant penalties when the training power is low. However, we remark that for small
L and low SNR our scheme of indicating the strongest user and transmitting with constant
power may be highly suboptimal, and alternative feedback schemes may achieve significantly
higher rates (e.g. do not schedule any users for transmission unless the strongest estimated
gain exceeds some threshold).

• The achievable rate C scales as $\Theta(\log\log L)$, unlike the $\Theta(\log\log K_{\text{total}})$ scaling of capacity
regardless of block length in the case of full CSI. This suggests that the amount of multiuser
diversity achieved in the fading MAC actually depends primarily on the block length, rather
than the total number of users in the system.

5 The result in [9] was actually for the TDD downlink, but the problem formulation is very similar to the FDD uplink and gives
the same growth rate for the optimal number of users.

VI. Numerical Results 

In this section we present numerical results of the system. We use P = 1, $\sigma_h^2 = 1$ and $\sigma_z^2 = 0.1$,
giving an overall SNR of S = 10. Figure 1 shows the plot of C versus K with the block length
fixed at L = 250. Even with this relatively small block length, only a small proportion of the time
is spent training, with the optimal number of users at $K^* = 14$. In Figure 2 we compare C with
$C_{a1}$ and $C_{a2}$ by plotting the corresponding normalized differences (i.e. $\frac{|C - C_{a1}|}{C}$ and $\frac{|C - C_{a2}|}{C}$) for
increasing L. As expected from Lemma 3, the differences tend to zero in both cases, albeit with
slow convergence.

The scaling of $\alpha^*$, $P_T^*$ and $K^*$ are shown in Figures 3, 4 and 5 respectively. The first and second
order asymptotic expressions derived in Section IV are shown on the same axes (e.g. the plot of
$\alpha$ in Figure 3 uses the expression in (21), giving the first order expression $\frac{S}{S+1}\frac{1}{(\log L)^2}$ and second
order expression $\frac{S}{S+1}\frac{1}{(\log L)^2} + \frac{S(2S+4)}{(S+1)^2}\frac{\log\log L}{(\log L)^3}$). Although the first order expressions have the same
growth rate as the optimal parameters, the gap between the two is reasonable at practical block
lengths. On the other hand, the second order parameters approximate the optimal parameters well
even at moderate block lengths.

VII. Conclusion 

We have analyzed a single antenna FDD narrowband MAC with training and user scheduling, 
using a Rayleigh block fading channel model with independent fading between users. Considering 
a lower bound on ergodic capacity, a closed form expression has been computed for the optimal 
proportion of power spent on training, and it has been shown that the optimal training sequence
length is $\bar{T} = 1$ symbol per user. Second order asymptotic expressions have been obtained for the
optimal parameters in terms of K and L. Considering the system behavior as L grows large, an 
approximate expression for the achievable rate has been optimized over K, and the resulting second 
order expressions for the optimized parameters have been obtained. 

There are several possible directions for further work. The orthogonal training scheme could be 
replaced by a more realistic scenario in which the users' coherence blocks are not aligned. Several 
different fading models could be considered, including asymmetric statistics and fading distributions 
other than Rayleigh. With multiple antennas at the base station it would become preferable to allow 
multiple users to transmit at once [16], adding another level of complexity to the problem. Finally,
an interesting problem would be the full analysis of the tradeoff between uplink and downlink rate 
with training and feedback. 



Appendix

A. Proof of Lemma 3

We split this proof into two parts, corresponding to the statements containing $C_{a1}$ and $C_{a2}$.

1) Expression for $C_{a1}$: From (9) and (14) we have $\epsilon_T^* = \Theta\left(\sqrt{\frac{K}{L}}\right) = o(1)$ and $\alpha^* = \frac{K}{L} = o(\epsilon_T^*)$,
which we substitute into (10) to obtain $x \sim \frac{1}{S}$, or more simply $x = \Theta(1)$. We also note that the
values of K which maximize C and $C_{a1}$ both grow unbounded for large L, i.e. $K \to \infty$. Using these
observations, we derive upper and lower bounds such that $C \le C_{a1} + o(1)$ and $C \ge C_{a1} + o(1)$, using
the techniques of [9, Proposition 1]. Starting with the upper bound, we apply Jensen's inequality
to (7) to obtain

$$C \le (1-\alpha)\log\left(1 + \frac{1}{x}E\left[|h_1^*|^2\right]\right). \tag{26}$$

We have $E[|h_1^*|^2] = \sum_{k=1}^{K}\frac{1}{k}$, which is upper bounded by $1 + \log(K+1)$. Hence

$$C \le (1-\alpha)\log\left(1 + \frac{1}{x}\left(1 + \log(K+1)\right)\right) = C_{a1} + (1-\alpha)\log\left(1 + \frac{1 + \log\left(1 + \frac{1}{K}\right)}{x + \log K}\right). \tag{27}$$

Using $x = \Theta(1)$ and $K \to \infty$, it is clear that the second term of (27) is o(1).

To obtain a lower bound on C, we use Markov's inequality, which states that $E[X] \ge \Pr(X \ge \beta)\beta$
for any non-negative random variable X and $\beta > 0$. Choosing $X = (1-\alpha)\log\left(1 + \frac{1}{x}|h_1^*|^2\right)$ and
$\beta = (1-\alpha)\log\left(1 + \frac{t}{x}\right)$ where t satisfies $\Pr(|h_1^*|^2 \ge t) = 1 - \frac{1}{\log K}$, the corresponding value of t
is the unique solution to

$$1 - \left(1 - e^{-t}\right)^K = 1 - \frac{1}{\log K}.$$

It is easy to show that $t = \log K - \log\log\log K$ satisfies this equation asymptotically, and therefore
$t = (\log K - \log\log\log K)(1 + o(1))$, or more simply $t = \log K + o(\log K)$. Hence the lower bound
is

$$C \ge \left(1 - \frac{1}{\log K}\right)(1-\alpha)\log\left(1 + \frac{1}{x}\left(\log K + o(\log K)\right)\right) \tag{28}$$

$$= C_{a1} + (1-\alpha)\log\left(1 + \frac{o(\log K)}{x + \log K}\right) + O\left(\frac{\log\log K}{\log K}\right). \tag{29}$$

Again, using $x = \Theta(1)$ and $K \to \infty$, the second and third terms of (29) are o(1). Combining the
upper and lower bounds, it follows that $\lim_{L\to\infty}|C - C_{a1}| = 0$.
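The claim that t = log K - log log log K solves the threshold equation asymptotically is easy to check numerically. The sketch below evaluates Pr(max of K exp(1) variables ≥ t) at that threshold and compares it with the target 1 - 1/log K; the function name and the chosen values of K are illustrative assumptions.

```python
import math


def exceed_probability(t, num_users):
    """Pr(max of num_users i.i.d. exp(1) variables >= t), for t > 0."""
    # (1 - e^{-t})^K is computed in the log domain for numerical accuracy.
    return -math.expm1(num_users * math.log1p(-math.exp(-t)))


for num_users in (10 ** 2, 10 ** 4, 10 ** 6, 10 ** 9):
    log_k = math.log(num_users)
    t = log_k - math.log(math.log(log_k))
    target = 1.0 - 1.0 / log_k
    print(num_users, exceed_probability(t, num_users), target)
```

The two printed probabilities move closer together as K grows, consistent with the asymptotic statement.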
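2) Expression for $C_{a2}$: Substituting (5) and (8) into (17) gives

$$C_{a1} = \left(1 - \frac{K}{L}\right)\log\left(1 + \frac{\sigma_h^2 - (\sigma_e^2)^*}{(\sigma_e^2)^* + \frac{\sigma_z^2}{P}\frac{1-\alpha^*}{1-\epsilon_T^*}}\log K\right). \tag{30}$$

We proceed to show that this can be reduced to (18). We define $c_1 = \sqrt{\frac{S+1}{S}}$ and $c_2 = \frac{\sigma_z^2}{P}\sqrt{\frac{S}{S+1}}$,
so that $\epsilon_T^* = c_1\sqrt{\frac{K}{L}} + O\left(\frac{K}{L}\right)$ and $(\sigma_e^2)^* = c_2\sqrt{\frac{K}{L}} + O\left(\frac{K}{L}\right)$. Substituting these expressions into (30)
and applying a sequence of manipulations gives

$$C_{a1} = \left(1 - \frac{K}{L}\right)\log\left(1 + \frac{\sigma_h^2 - c_2\sqrt{\frac{K}{L}} + O\left(\frac{K}{L}\right)}{c_2\sqrt{\frac{K}{L}} + O\left(\frac{K}{L}\right) + \frac{\sigma_z^2}{P}\frac{1 - K/L}{1 - c_1\sqrt{K/L} + O(K/L)}}\log K\right) \tag{31}$$

$$= \left(1 - \frac{K}{L}\right)\log\left(1 + \frac{\sigma_h^2 - c_2\sqrt{\frac{K}{L}} + O\left(\frac{K}{L}\right)}{\frac{\sigma_z^2}{P} + c_3\sqrt{\frac{K}{L}} + O\left(\frac{K}{L}\right)}\log K\right) \tag{32}$$

$$= \left(1 - \frac{K}{L}\right)\log\left(1 + S\left(1 - c_4\sqrt{\frac{K}{L}} + O\left(\frac{K}{L}\right)\right)\log K\right) \tag{33}$$

where $c_3 = c_2 + \frac{\sigma_z^2 c_1}{P}$, $c_4 = \frac{c_2}{\sigma_h^2} + \frac{P c_3}{\sigma_z^2}$ and we have used $\frac{1}{1+a} = 1 - a + O(a^2)$ as $a \to 0$. The value
of $c_4$ can be simplified to $2\sqrt{\frac{S+1}{S}}$, and the expression for $C_{a2}$ follows by removing the $O\left(\frac{K}{L}\right)$ term.
To prove that $\lim_{L\to\infty}|C - C_{a2}| = 0$ it suffices to show that $\lim_{L\to\infty}|C_{a1} - C_{a2}| = 0$, but this is a
simple consequence of the fact that $\frac{K}{L} = o(1)$ and hence the $O\left(\frac{K}{L}\right)$ term in (33) only contributes
an additive o(1) term to $|C_{a1} - C_{a2}|$.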

B. Proof of Lemma 

To show that the O(j^) term in (|33l ) is insignificant, we replace it with dj- for an arbitrary 
constant d, and show that the second order asymptotic expression for K* does not depend on d. 
We define the resulting expression as 

C a3 = log ^ + S(l - c \fj + d j;) l°g^ (34) 



where c = 2w Setting ^C a3 = gives the necessary condition for K to maximize C a3 , 



S(L -K) (2(1 -cJ% + df)-logK(cJf -2dKf)) [K K, 

±— 1 = — = log 1 + 5(1 -cJ- + d-) hgK 

2K(s{l-c^f + df)logK+V 1 v L 



(35) 

Hence, 

$$ \frac{L\left(2 - c\sqrt{\frac{K}{L}}\log K\right) + o(L) + o\left(L\sqrt{\frac{K}{L}}\log K\right)}{2K\log K + o(K\log K)} = \log\log K + O(1). \tag{36} $$

It is not immediately obvious whether the dominant term in the numerator of the left hand side
of (36) is $2L$ or $-cL\sqrt{\frac{K}{L}}\log K$. The following lemma shows that they in fact have the same first
order asymptotic growth rate.

Lemma 6. A necessary condition for $K$ to satisfy (36) is $\sqrt{\frac{K}{L}}\log K = \Theta(1)$. Furthermore, for
sufficiently large $L$ there exists such a solution.

Proof: We first note that $K = 1$ or $K = L$ gives $C_{a3} = 0$, and for large $L$ there always exist
values $1 < K < L$ such that $C_{a3} > 0$. Combining this with the fact that $C_{a3}$ is continuous in $K$,
$C_{a3}$ must have a local maximum and therefore (35) must have a solution for large $L$. If $\sqrt{\frac{K}{L}}\log K$
grows faster than $\Theta(1)$, then the numerator of the left hand side of (36) is negative when $L$ is large,
which is not possible. If $\sqrt{\frac{K}{L}}\log K = o(1)$, it is easily verified that $L \sim K(\log K)(\log\log K)$, which
contradicts the assumption that $\sqrt{\frac{K}{L}}\log K = o(1)$. Therefore $\sqrt{\frac{K}{L}}\log K = \Theta(1)$ is necessary. ■
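
The step described as easily verified in the proof can be spelled out as follows. This is only a sketch: it assumes the case $\sqrt{K/L}\log K = o(1)$, so that both $o(\cdot)$ terms in the numerator of (36) are $o(L)$ and the numerator is $2L(1 + o(1))$.

```latex
% Case \sqrt{K/L}\,\log K = o(1): the left hand side of (36) reduces to L / (K \log K)
\frac{2L\,(1 + o(1))}{2K\log K\,(1 + o(1))} = \log\log K + O(1)
\;\Longrightarrow\;
L \sim K(\log K)(\log\log K),
\qquad\text{so that}\qquad
\sqrt{\frac{K}{L}}\,\log K \sim \sqrt{\frac{\log K}{\log\log K}} \to \infty .
```

The last quantity diverges, which is incompatible with the starting assumption that it is $o(1)$.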
Next we define

$$ p = \sqrt{\frac{K}{L}}\,\log K \tag{37} $$

which can be rearranged to obtain

$$ L = \frac{1}{p^2}\,K(\log K)^2. \tag{38} $$



Substituting (37) and (38) into (36) gives $\frac{1}{p^2}\left(1 - \frac{cp}{2}\right)\log K \sim \log\log K$, which is only possible if
$p \sim \frac{2}{c}$. Therefore, $L \sim \frac{c^2}{4}K(\log K)^2$, giving a first order expression for $L$ in terms of $K$. To obtain
a second order expression, we set $p = \frac{2}{c} + \delta$ and proceed to find a first order expression for $\delta$.
From (37) and (38), we obtain

$$ \sqrt{\frac{K}{L}}\,\log K = \frac{2}{c} + \delta \tag{39} $$

$$ L = \frac{c^2}{4}\left(1 - c\delta + O(\delta^2)\right)K(\log K)^2. \tag{40} $$
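
The factor $(1 - c\delta + O(\delta^2))$ in (40) comes from expanding $1/p^2$ around $p = 2/c$. A short worked version is given below, assuming $\delta = o(1)$ (which holds because $p \sim 2/c$) and using $(1 + a)^{-2} = 1 - 2a + O(a^2)$ as $a \to 0$.

```latex
\frac{1}{p^2}
= \left(\frac{2}{c} + \delta\right)^{-2}
= \frac{c^2}{4}\left(1 + \frac{c\delta}{2}\right)^{-2}
= \frac{c^2}{4}\left(1 - c\delta + O(\delta^2)\right).
```

Multiplying by $K(\log K)^2$ as in (38) then gives the stated form of $L$.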



Writing (35) as

$$ \frac{L\left(1 - \frac{c}{2}\sqrt{\frac{K}{L}}\log K\right) + O(K\log K)}{K\log K\left(1 + O\left(\frac{1}{\log K}\right)\right)} = \log\log K + O(1) \tag{41} $$



and substituting (39) and (40), we obtain

$$ -\frac{c^3\delta}{8}\,\log K\,(1 + o(1)) = \log\log K + O(1). \tag{42} $$

This implies that $\delta \sim -\frac{8\log\log K}{c^3\log K}$ and hence, from (40),

$$ L = \frac{c^2}{4}K(\log K)^2 + 2K(\log K)(\log\log K) + O\left(K(\log\log K)^2\right). $$



Substituting $c = 2\sqrt{\frac{S+1}{S}}$ concludes the proof. As previously mentioned, there is no dependence on
$d$ in the final expression.
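
A rough numerical check of this argument is sketched below. It is not part of the original proof. It assumes $C_{a3}$ has the form $(1 - K/L)\log(1 + S(1 - c\sqrt{K/L} + dK/L)\log K)$ with $c = 2\sqrt{(S+1)/S}$, restricts $K$ to integers, and uses hypothetical helper names (`c_a3`, `best_k`, `lemma_rhs`). For each block length it finds the maximizing $K$ by exhaustive search for a few values of $d$, and reports $\sqrt{K/L}\log K$ together with the ratio of the two leading terms of the expression for $L$ to the true $L$.

```python
import math


def c_a3(K, L, S, d=0.0):
    """Approximate rate C_a3 (assumed form of (34)).

    Returns -inf where the argument of the logarithm is not positive,
    which only happens for K far from the maximizer.
    """
    r = K / L
    c = 2.0 * math.sqrt((S + 1.0) / S)
    snr_term = S * (1.0 - c * math.sqrt(r) + d * r) * math.log(K)
    if 1.0 + snr_term <= 0.0:
        return float("-inf")
    return (1.0 - r) * math.log(1.0 + snr_term)


def best_k(L, S, d=0.0):
    """Integer K in [2, L-1] that maximizes C_a3, by exhaustive search."""
    return max(range(2, L), key=lambda K: c_a3(K, L, S, d))


def lemma_rhs(K, S):
    """Two leading terms of L as a function of K: (c^2/4) K (log K)^2 + 2 K log K log log K."""
    c = 2.0 * math.sqrt((S + 1.0) / S)
    log_k = math.log(K)
    return (c * c / 4.0) * K * log_k ** 2 + 2.0 * K * log_k * math.log(log_k)


if __name__ == "__main__":
    S = 1.0  # illustrative SNR value, not taken from the paper
    for L in (10**3, 10**4, 10**5):
        for d in (-1.0, 0.0, 1.0):
            K = best_k(L, S, d)
            p = math.sqrt(K / L) * math.log(K)
            print(L, d, K, round(p, 3), round(lemma_rhs(K, S) / L, 3))
```

At block lengths of this size the maximizer still moves a little with $d$, and the two-term expression in $K$ is not yet tight, which is consistent with the statements above being asymptotic in $L$.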
C. Proof of Theorem 5

For brevity, we write $K$ instead of $K^*$ throughout this section. Several steps will make use of
$\frac{1}{1+a} = 1 - a + O(a^2)$ and $\log(1 + a) = O(a)$ as $a \to 0$. From (19) we obtain



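$$ \frac{L}{K} = \frac{S+1}{S}(\log K)^2\left(1 + \frac{2S}{S+1}\frac{\log\log K}{\log K} + O\left(\left(\frac{\log\log K}{\log K}\right)^2\right)\right) \tag{43} $$

and consequently

$$ \log L = \log K\left(1 + \frac{2\log\log K}{\log K} + O\left(\frac{1}{\log K}\right)\right) \tag{44} $$

$$ \log\log L = \log\log K + O\left(\frac{\log\log K}{\log K}\right). \tag{45} $$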



From (44) and (45) we obtain

$$ (\log L)^2 = (\log K)^2 + 4\log K\log\log K + O(\log K) \tag{46} $$

$$ \frac{\log\log L}{\log L} = \frac{\log\log K}{\log K} + O\left(\left(\frac{\log\log K}{\log K}\right)^2\right). \tag{47} $$

Combining (43) and (46) gives

$$ \frac{L}{K(\log L)^2} = \frac{S+1}{S}\left(1 - \frac{2S+4}{S+1}\frac{\log\log K}{\log K} + O\left(\frac{1}{\log K}\right)\right) $$

which, when combined with (47), gives the expression for $K^*$ in (20) after solving for $K$ and
substituting $O(\frac{1}{\log K}) = O(\frac{1}{\log L})$.

We now derive asymptotic expressions for each variable in terms of $L$ after substituting $K$ from
(20). The optimal value of $\alpha^*$ given by ((IT]) follows immediately from (20) and $\alpha^* = \frac{K}{L}$. An
alternate expression for $\frac{K^*}{L}$ is then given by

$$ \frac{K^*}{L} = \frac{S}{S+1}\frac{1}{(\log L)^2}\left(1 + \frac{2S+4}{S+1}\frac{\log\log L}{\log L} + O\left(\left(\frac{\log\log L}{\log L}\right)^2\right)\right). \tag{48} $$

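Taking the square root and using $\sqrt{1 + a} = 1 + \frac{a}{2} + O(a^2)$ as $a \to 0$,

$$ \sqrt{\frac{K^*}{L}} = \sqrt{\frac{S}{S+1}}\frac{1}{\log L}\left(1 + \frac{S+2}{S+1}\frac{\log\log L}{\log L} + O\left(\left(\frac{\log\log L}{\log L}\right)^2\right)\right). \tag{49} $$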
The final expression for follows from substituting (1481 ) and (|49l ) into (fT4l . and similarly for 
(<j*) 2 and (|24l) . The expression for follows from substituting the expressions for a* and into 
Pj, = -2^-. The expression for $C^*$ follows from substituting the optimal parameters into (18) and
using the result that $|C - C_{a2}| = o(1)$ from Lemma 3.
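
To get a feel for how slowly the second order terms decay, the leading terms of the expansions for $K^*/L$ and $\sqrt{K^*/L}$ can be evaluated directly. The sketch below is an illustration only: it assumes the expansions read $\frac{K^*}{L} \approx \frac{S}{S+1}\frac{1}{(\log L)^2}\left(1 + \frac{2S+4}{S+1}\frac{\log\log L}{\log L}\right)$ and $\sqrt{K^*/L} \approx \sqrt{\frac{S}{S+1}}\frac{1}{\log L}\left(1 + \frac{S+2}{S+1}\frac{\log\log L}{\log L}\right)$ with the $O(\cdot)$ remainders dropped, and the function names are hypothetical.

```python
import math


def k_first_order(L, S):
    """First order term only: K ~ S/(S+1) * L / (log L)^2."""
    return S / (S + 1.0) * L / math.log(L) ** 2


def k_second_order(L, S):
    """First order term times the (1 + (2S+4)/(S+1) * loglog L / log L) correction."""
    log_l = math.log(L)
    correction = 1.0 + (2.0 * S + 4.0) / (S + 1.0) * math.log(log_l) / log_l
    return k_first_order(L, S) * correction


def sqrt_ratio_second_order(L, S):
    """Two-term expansion of sqrt(K/L), remainder dropped."""
    log_l = math.log(L)
    correction = 1.0 + (S + 2.0) / (S + 1.0) * math.log(log_l) / log_l
    return math.sqrt(S / (S + 1.0)) / log_l * correction


if __name__ == "__main__":
    S = 1.0  # illustrative SNR value, not taken from the paper
    for L in (10**3, 10**4, 10**5, 10**6):
        k1 = k_first_order(L, S)
        k2 = k_second_order(L, S)
        print(L, round(k1, 1), round(k2, 1), round(k2 / k1, 3))
```

With these assumptions the correction factor is still well above 1 at $L = 10^6$, since $\frac{\log\log L}{\log L}$ decays very slowly, so the first order term alone noticeably underestimates $K$ at practical block lengths.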

References 

[1] R. Knopp and P. A. Humblet, "Information capacity and power control in single-cell multiuser communications," in IEEE International Conference on Communications, Seattle, WA, June 1995.
[2] X. Qin and R. A. Berry, "Exploiting multiuser diversity for medium access control in wireless networks," in IEEE INFOCOM, San Francisco, CA, March 2003.
[3] S. Sanayei and A. Nosratinia, "Exploiting multiuser diversity with only 1-bit feedback," in IEEE Wireless Communications and Networking Conference, New Orleans, LA, March 2005.
[4] M. Sharif and B. Hassibi, "On the capacity of MIMO broadcast channels with partial side information," IEEE Transactions on Information Theory, vol. 51, no. 2, pp. 506-522, February 2005.
[5] S. R. Bhaskaran, L. Davis, A. Grant, S. Hanly, and P. Tune, "Downlink scheduling using compressed sensing," in IEEE Information Theory Workshop on Networking and Information Theory, Volos, Greece, June 2009.
[6] X. Qin and R. A. Berry, "Distributed approaches for exploiting multiuser diversity in wireless networks," IEEE Transactions on Information Theory, vol. 52, no. 2, pp. 392-413, February 2006.
[7] M. Agarwal and M. L. Honig, "Wideband fading channel capacity with training and partial feedback," IEEE Transactions on Information Theory, vol. 56, no. 10, pp. 4865-4873, October 2010.
[8] M. Agarwal and M. L. Honig, "Spectrum sharing on a wideband fading channel with limited feedback," in CrownCom International Conference on Cognitive Radio Oriented Wireless Networks and Communications, Orlando, FL, August 2007.
[9] A. Rajanna and N. Jindal, "Multiuser diversity in downlink channels: When does the feedback cost outweigh the spectral efficiency gain?" http://arxiv.org/abs/1102.1552.
[10] B. Hassibi and B. M. Hochwald, "How much training is needed in multiple-antenna wireless links?" IEEE Transactions on Information Theory, vol. 49, no. 4, pp. 951-963, April 2003.
[11] W. Santipach and M. Honig, "Optimization of training and feedback overhead for beamforming over block fading channels," IEEE Transactions on Information Theory, vol. 56, no. 12, pp. 6103-6115, December 2010.
[12] M. Kobayashi, N. Jindal, and G. Caire, "How much training and feedback are needed in MIMO broadcast channels?" in IEEE International Symposium on Information Theory, Toronto, Canada, July 2008.
[13] G. Caire, N. Jindal, M. Kobayashi, and N. Ravindran, "Multiuser MIMO achievable rates with downlink training and channel state feedback," IEEE Transactions on Information Theory, vol. 56, no. 6, pp. 2845-2866, June 2010.
[14] M. Medard, "The effect upon channel capacity in wireless communications of perfect and imperfect knowledge of the channel," IEEE Transactions on Information Theory, vol. 46, no. 3, pp. 933-946, May 2000.
[15] M. Mecking, "Resource allocation for fading multiple-access channels with partial channel state information," in IEEE International Conference on Communications, New York, NY, April 2002.
[16] D. N. C. Tse, P. Viswanath, and L. Zheng, "Diversity-multiplexing tradeoff in multiple-access channels," IEEE Transactions on Information Theory, vol. 50, no. 9, pp. 1859-1874, September 2004.






[17] H. A. David and H. N. Nagaraja, Order Statistics, 3rd Edition. New York: John Wiley and Sons, 2003. 



Figure 1. Achievable rate as a function of K with L = 250 (horizontal axis: number of users)
Figure 3. Optimal values and asymptotic expressions for $\alpha$
Figure 4. Optimal values and asymptotic expressions for Pt
Figure 5. Optimal values and asymptotic expressions for K